BigQuery Magic Commands and DML

The examples in this notebook introduce features of BigQuery Standard SQL and BigQuery SQL Data Manipulation Language (beta). BigQuery Standard SQL is compliant with the SQL 2011 standard. You've already seen the use of the magic command %%bq in the Hello BigQuery and BigQuery Commands notebooks. This command and others in the Google Cloud Datalab API support BigQuery Standard SQL.

Using the BigQuery Magic command with Standard SQL

First, we will cover some more uses of the %%bq magic command. Let's define a query to work with:


In [54]:
%%bq query --name UniqueNames2013
WITH UniqueNames2013 AS
(SELECT DISTINCT name
  FROM `bigquery-public-data.usa_names.usa_1910_2013`
  WHERE Year = 2013)
SELECT * FROM UniqueNames2013

Now let's list all available commands to work with %%bq


In [55]:
%%bq -h


usage: bq [-h]
          {datasets,tables,query,execute,extract,sample,dryrun,udf,datasource,load}
          ...

Execute various BigQuery-related operations. Use "%bq <command> -h" for help
on a specific command.

positional arguments:
  {datasets,tables,query,execute,extract,sample,dryrun,udf,datasource,load}
                        commands
    datasets            Operations on BigQuery datasets
    tables              Operations on BigQuery tables
    query               Create or execute a BigQuery SQL query object,
                        optionally using other SQL objects, UDFs, or external
                        datasources. If a query name is not specified, the
                        query is executed.
    execute             Execute a BigQuery SQL query and optionally send the
                        results to a named table. The cell can optionally
                        contain arguments for expanding variables in the
                        query.
    extract             Extract a query or table into file (local or GCS)
    sample              Display a sample of the results of a BigQuery SQL
                        query. The cell can optionally contain arguments for
                        expanding variables in the query, if -q/--query was
                        used, or it can contain SQL for a query.
    dryrun              Execute a dry run of a BigQuery query and display
                        approximate usage statistics
    udf                 Create a named Javascript BigQuery UDF
    datasource          Create a named Javascript BigQuery external data
                        source
    load                Load data from GCS into a BigQuery table. If creating
                        a new table, a schema should be specified in YAML or
                        JSON in the cell body, otherwise the schema is
                        inferred from existing table.

optional arguments:
  -h, --help            show this help message and exit
None

The dryrun argument in %%bq can be helpful to confirm the syntax of the SQL query. Instead of executing the query, it will only return some statistics:


In [56]:
%%bq dryrun -q UniqueNames2013


Out[56]:

Dry run information: 88MB to process, results cached

Now, let's get a small sample of the results using the sample argument in %%bq:


In [57]:
%%bq sample -q UniqueNames2013


Out[57]:
name
Coleton
Amberlee
Anwar
Kennedy
Rainier
Joaquin
Gisela
Elienai
Myra
Jentry

(rows: 10, time: 0.3s, cached, job: job_gVR-rWDXxEkBelTPU4vAuLcJdOs)

Finally, We can use the execute command in %%bq to display the results of our query:


In [58]:
%%bq execute -q UniqueNames2013


Out[58]:
name
Carmelo
Blane
Aryan
Joeziah
Izabell
Kevon
Tsering
Ubaldo
Alyanna
Zahira
Ariely
Brittney
Bj
Rhianna
Elya
Caliana
Airam
Antoni
Santo
Estrellita
Allysa
Zoraya
Romina
Murphy
Yobani

(rows: 9510, time: 0.8s, cached, job: job_Zsm5IL9Cv_63abfTEDv4vv4Q8cc)

Using Google BigQuery SQL Data Manipulation Language

Below, we will demonstrate how to use Google BigQuery SQL Data Manipulation Language (DML) in Datalab.

Preparation

First, let's import the BigQuery module, and create a sample dataset and table to help demonstrate the features of Google BigQuery DML.


In [1]:
import google.datalab.bigquery as bq

In [2]:
# Create a new dataset (this will be deleted later in the notebook)
sample_dataset = bq.Dataset('sampleDML')
if not sample_dataset.exists():
  sample_dataset.create(friendly_name = 'Sample Dataset for testing DML', description = 'Created from Sample Notebook in Google Cloud Datalab')
  sample_dataset.exists()

In [67]:
# To create a table, we need to create a schema for it.
# Its easiest to create a schema from some existing data, so this
# example demonstrates using an example object
fruit_row = {
  'name': 'string value',
  'count': 0
}

sample_table1 = bq.Table("sampleDML.fruit_basket").create(schema = bq.Schema.from_data([fruit_row]), 
                                                          overwrite = True)

Inserting Data

We can add rows to our newly created fruit_basket table by using an INSERT statement in our BigQuery Standard SQL query.


In [68]:
%%bq query
INSERT sampleDML.fruit_basket (name, count)
VALUES('banana', 5),
      ('orange', 10),
      ('apple', 15),
      ('mango', 20)


Out[68]:
countname
15apple
5banana
10orange
20mango

(rows: 4, time: 1.6s, 0B processed, job: job_e--49sI22zCRPH5cBFhr7GdMW10)

You may rewrite the previous query as:


In [69]:
%%bq query
INSERT sampleDML.fruit_basket (name, count)
SELECT * 
FROM UNNEST([('peach', 25), ('watermelon', 30)])


Out[69]:
countname
15apple
5banana
10orange
20mango
25peach
30watermelon

(rows: 6, time: 1.8s, 0B processed, job: job_3KF4KPpcNjz3LNp81JRITBsWfNY)

You can also use a WITH clause with INSERT and SELECT.


In [70]:
%%bq query
INSERT sampleDML.fruit_basket(name, count)
WITH w AS (
  SELECT ARRAY<STRUCT<name string, count int64>>
      [('cherry', 35),
      ('cranberry', 40),
      ('pear', 45)] col
)
SELECT name, count FROM w, UNNEST(w.col)


Out[70]:
countname
15apple
5banana
10orange
20mango
25peach
30watermelon
45pear
35cherry
40cranberry

(rows: 9, time: 1.9s, 0B processed, job: job_EcuLfvkWytctD0zq7O9HO5hjN-A)

Here is an example that copies one table's contents into another. First we will create a new table.


In [71]:
fruit_row_detailed = {
  'name': 'string value',
  'count': 0,
  'readytoeat': False
}
sample_table2 = bq.Table("sampleDML.fruit_basket_detailed").create(schema = bq.Schema.from_data([fruit_row_detailed]), 
                                                                   overwrite = True)

In [72]:
%%bq query
INSERT sampleDML.fruit_basket_detailed (name, count, readytoeat)
SELECT name, count, false
FROM sampleDML.fruit_basket


Out[72]:
countreadytoeatname
20Falsemango
25Falsepeach
35Falsecherry
45Falsepear
30Falsewatermelon
40Falsecranberry
10Falseorange
5Falsebanana
15Falseapple

(rows: 9, time: 1.9s, 146B processed, job: job_WCwTcCKTHFjEHWJVZPaDpMIXfNU)

Updating Data

You can update rows in the fruit_basket table by using an UPDATE statement in the BigQuery Standard SQL query. We will try to do this using the Datalab BigQuery API.


In [73]:
%%bq query
UPDATE sampleDML.fruit_basket_detailed
SET readytoeat = True
WHERE name = 'banana'


Out[73]:
countreadytoeatname
35Falsecherry
10Falseorange
20Falsemango
15Falseapple
30Falsewatermelon
25Falsepeach
45Falsepear
40Falsecranberry
5Truebanana

(rows: 9, time: 1.9s, 155B processed, job: job_v1HJMzfoB_IILWrCJp9PMlLkMr4)

To view the contents of a table in BigQuery, use %%bq tables view command:


In [ ]:
%%bq tables view --name sampleDML.fruit_basket_detailed

Deleting Data

You can delete rows in the fruit_basket table by using a DELETE statement in the BigQuery Standard SQL query.


In [75]:
%%bq query
DELETE sampleDML.fruit_basket
WHERE name in ('cherry', 'cranberry')


Out[75]:
countname
15apple
5banana
10orange
20mango
25peach
30watermelon
45pear

(rows: 7, time: 2.0s, 146B processed, job: job_nEPAwtkWbRaQs-av2VF0M4FrmYY)

Use the following query to delete the corresponding entries in sampleDML.fruit_basket_detailed


In [76]:
%%bq query
DELETE sampleDML.fruit_basket_detailed
WHERE NOT EXISTS
  (SELECT * FROM sampleDML.fruit_basket
  WHERE fruit_basket_detailed.name = fruit_basket.name)


Out[76]:
countreadytoeatname
15Falseapple
10Falseorange
25Falsepeach
20Falsemango
30Falsewatermelon
45Falsepear
5Truebanana

(rows: 7, time: 6.7s, 210B processed, job: job_2hSv9IzLp0tlWk_cOAMshuqwEnY)

Deleting Resources


In [3]:
# Clear out sample resources
sample_dataset.delete(delete_contents = True)